Claims



What is claimed is: 

1. A method for creating a description of a document of a remote network data
source for later identification of the document, comprising:
(a) receiving information from a user about a document on a remote network data
site;
(b) creating a document identifier based on the user-input information, wherein
the document identifier identifies the particular document;
(c) retrieving a markup language description defining properties of elements of
a document in a markup language;
(d) analyzing the document and the content of the document utilizing the
document identifier and the markup language description;
(e) generating a description of the document based on the analysis; and
(f) storing the document description.
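
As a rough illustration of steps (a) through (f) of claim 1, the following Python sketch builds a description for an HTML document held in a string. Everything named here (`MARKUP_DESCRIPTION`, `ElementCollector`, `create_description`, the dictionary layout, and the use of a URL as the document identifier) is an assumption made for the example and is not taken from the claims.

```python
from html.parser import HTMLParser

# Hypothetical markup language description: for each tag, the properties
# (attributes) worth recording. Assumed for illustration only.
MARKUP_DESCRIPTION = {"a": ["href"], "img": ["src", "alt"], "table": ["id"]}


class ElementCollector(HTMLParser):
    """Collects elements whose tags appear in the markup description."""

    def __init__(self, markup_description):
        super().__init__()
        self.markup_description = markup_description
        self.elements = []

    def handle_starttag(self, tag, attrs):
        wanted = self.markup_description.get(tag)
        if wanted is None:
            return
        attr_map = dict(attrs)
        properties = {name: attr_map[name] for name in wanted if name in attr_map}
        self.elements.append({"tag": tag, "properties": properties})


def create_description(user_info, html_text, store):
    # (a) user_info is the information received from the user
    # (b) derive a document identifier from it (assumed here to be a URL)
    document_id = user_info["url"]
    # (c) retrieve the markup language description (a constant in this sketch)
    markup_description = MARKUP_DESCRIPTION
    # (d) analyze the document and its content
    collector = ElementCollector(markup_description)
    collector.feed(html_text)
    interesting = [
        element
        for element in collector.elements
        if element["tag"] in user_info["tags_of_interest"]
    ]
    # (e) generate the description: elements of interest and their properties
    description = {"id": document_id, "elements": interesting}
    # (f) store the description
    store[document_id] = description
    return description
```

A caller would pass a dictionary such as `{"url": "http://example.com/page", "tags_of_interest": {"a"}}` together with the page text and a plain `dict` acting as the store.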

2. The method as recited in claim 1, wherein information received from the user
includes at least one of: an identification of content of interest in the
document, guidelines for recognizing a document, and guidelines for recognizing
content elements of interest.

3. The method as recited in claim 1, wherein the document description contains a
list of elements of interest and element properties for the elements of interest.






4. The method as recited in claim 1, wherein the analysis of the content is for
identifying elements of interest of the content of the document.

5. The method as recited in claim 4, wherein the markup language description is
used to identify properties of each of the elements of interest.

6. The method as recited in claim 5, wherein the elements of interest of the
content are identified based on properties of each element.

7. The method as recited in claim 1, wherein the document analysis includes
comparing the document to at least one other document, wherein the document
description is modified to reflect at least one difference between the documents.

8. The method as recited in claim 1, further comprising comparing the document to
at least one other document, wherein document descriptions of each of the
documents are modified to reflect at least one difference between the
documents.

9. The method as recited in claim 1, wherein the document is modified, wherein
the document identifier is modified, wherein the modified document is analyzed
for modifying the document description.

10. The method as recited in claim 9, wherein the document analysis includes
comparing the modified document to at least one other document, wherein the
document description is modified to reflect at least one difference between the
documents.

11. The method as recited in claim 1, wherein the method is performed during
creation of a transaction pattern.

12. A computer program product for creating a description of a document of a
remote network data source for later identification of the document, comprising:
(a) computer code for receiving information from a user about a document on a
remote network data site;
(b) computer code for creating a document identifier based on the user-input
information, wherein the document identifier identifies the particular document;
(c) computer code for retrieving a markup language description defining
properties of elements of a document in a markup language;
(d) computer code for analyzing the document and the content of the document
utilizing the document identifier and the markup language description;
(e) computer code for generating a description of the document based on the
analysis; and
(f) computer code for storing the document description.


13. The computer program product as recited in claim 12, wherein information
received from the user includes at least one of: an identification of content of
interest in the document, guidelines for recognizing a document, and guidelines
for recognizing content elements of interest.

14. The computer program product as recited in claim 12, wherein the document
description contains a list of elements of interest and element properties for
the elements of interest.

15. The computer program product as recited in claim 12, wherein the analysis of
the content is for identifying elements of interest of the content of the
document.

16. The computer program product as recited in claim 12, wherein the document
analysis includes comparing the document to at least one other document,
wherein the document description is modified to reflect at least one difference
between the documents.

17. The computer program product as recited in claim 12, further comprising
computer code for comparing the document to at least one other document,
wherein document descriptions of each of the documents are modified to reflect
at least one difference between the documents.

18. The computer program product as recited in claim 12, wherein the computer
program is executed during creation of a transaction pattern.

19. A system for creating a description of a document of a remote network data
source for later identification of the document, comprising:
(a) logic for receiving information from a user about a document on a remote
network data site;
(b) logic for creating a document identifier based on the user-input
information, wherein the document identifier identifies the particular document;
(c) logic for retrieving a markup language description defining properties of
elements of a document in a markup language;
(d) logic for analyzing the document and the content of the document utilizing
the document identifier and the markup language description;
(e) logic for generating a description of the document based on the analysis; and
(f) logic for storing the document description.

20. A method for creating a description of content of a remote network data
source for later identification of the content, comprising:
(a) receiving information from a user about content on a remote network data
site;
(b) creating a content identifier based on the user-input information, wherein
the content identifier identifies the particular content;
(c) retrieving a markup language description defining properties of elements of
the content in a markup language;
(d) analyzing the content utilizing the content identifier and the markup
language description;
(e) generating a description of the content based on the analysis; and
(f) storing the content description.


21. The method as recited in claim 20, wherein information received from the user
includes at least one of: an identification of content elements of interest,
guidelines for recognizing content, and guidelines for recognizing content
elements of interest.

22. The method as recited in claim 20, wherein the content description contains a
list of elements of interest and element properties for the elements of interest.

23. The method as recited in claim 20, wherein the content is a document.

24. The method as recited in claim 23, wherein a description of content items of
the document is stored.


25. A method for identifying a document, comprising:
(a) receiving a document;
(b) receiving document descriptions of several documents;
(c) comparing the document descriptions with the document;
(d) calculating a document recognition score for each of the document
descriptions based on a likelihood that the document description matches the
document;
(e) selecting a document description based at least in part on the document
recognition scores; and
(f) identifying the document based on the selected document description.
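
One way to picture steps (a) through (f) of claim 25 is the minimal Python sketch below, in which a document is reduced to a set of element names and each description lists the elements it expects. The scoring rule (fraction of expected elements found) and the names `recognition_score` and `identify_document` are assumptions chosen for the example; the claim only requires that the score reflect a likelihood of a match.

```python
def recognition_score(description, document_elements):
    """Fraction of the description's expected elements found in the document."""
    expected = description["elements"]
    if not expected:
        return 0.0
    found = sum(1 for element in expected if element in document_elements)
    return found / len(expected)


def identify_document(document_elements, descriptions):
    # (c)-(d) compare every description with the document and score it
    scores = {d["id"]: recognition_score(d, document_elements) for d in descriptions}
    # (e) select the description with the highest score
    best_id = max(scores, key=scores.get)
    # (f) the document is identified by the selected description
    return best_id, scores
```

Here `descriptions` is a list of dictionaries with an `"id"` and an `"elements"` list, and the function returns both the selected identifier and all scores so a caller can inspect how close the alternatives were.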

26. The method as recited in claim 25, wherein the document recognition score is
based at least in part on recognizing properties of elements of the documents in
the document descriptions.

27. The method as recited in claim 26, wherein each of the properties is given a
weight.

28. The method as recited in claim 27, wherein the weights are normalized.

29. The method as recited in claim 28, wherein selected elements of the document
are each given a content recognition score, wherein the content recognition score
is a weighted sum of values returned by a property evaluation function weighted
with the normalized weight of the property, wherein the content recognition
scores are used to determine whether each content element is present.
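
The weighting described in claims 27 through 29 can be sketched as follows. This is only an illustration: the exact-match `evaluate_property` function, the `threshold` default in `is_present`, and all names are assumptions, since the claims do not specify the property evaluation function or how presence is decided from the score.

```python
def normalize(weights):
    """Scale property weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: weight / total for name, weight in weights.items()}


def evaluate_property(expected, actual):
    """Hypothetical property evaluation function: 1.0 on exact match, else 0.0."""
    return 1.0 if expected == actual else 0.0


def content_recognition_score(expected_props, weights, candidate_props):
    """Weighted sum of property evaluations using normalized weights."""
    normalized = normalize(weights)
    return sum(
        normalized[name] * evaluate_property(value, candidate_props.get(name))
        for name, value in expected_props.items()
    )


def is_present(score, threshold=0.5):
    # The threshold is an assumed cut-off, not a value given in the claims.
    return score >= threshold
```

Every property named in `expected_props` needs an entry in `weights`; a graded evaluation function (for example, string similarity) could replace the exact-match version without changing the rest.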

30. The method as recited in claim 29, wherein the document recognition score for
each document description is calculated using the formula
$S_k = \sum_{i=1}^{N} p_i R_i$,
wherein N is a number of elements of interest in the document, $p_i$ is the
presence weight of element i, and $R_i$ is a function of the content recognition
score for element i.
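
Reading the formula of claim 30 as S_k = p_1 R_1 + ... + p_N R_N, a direct Python transcription might look like the sketch below. The function name and the parameter `r` are assumptions; `r` stands for the unspecified function that maps a content recognition score to R_i and defaults to the identity purely for illustration.

```python
def document_recognition_score(presence_weights, content_scores, r=lambda c: c):
    """Compute S_k as the sum of p_i * R_i over the N elements of interest."""
    if len(presence_weights) != len(content_scores):
        raise ValueError("one presence weight is needed per element of interest")
    return sum(p * r(c) for p, c in zip(presence_weights, content_scores))
```

Passing a step function for `r`, such as `lambda c: 1.0 if c >= 0.5 else 0.0`, turns each content score into a present-or-absent vote before it is weighted.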

31. The method as recited in claim 25, wherein the selection of the document is
based on the document recognition scores and deviation, wherein the deviation
is computed from the document recognition scores.

32. The method as recited in claim 31, wherein a document description with a high
document recognition score relative to other candidate document descriptions
and a deviation above a predetermined threshold is selected.

33. The method as recited in claim 31, wherein a document description with a low
document recognition score relative to other candidate document descriptions
and a deviation above a predetermined threshold is selected.
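
The selection rule of claims 31 through 33 can be illustrated with the Python sketch below. The `deviation` function here is an assumed stand-in (the mean absolute gap between the candidate's score and every other score) and is not the formula recited in the claims; `select_description` and `prefer_high` are likewise hypothetical names.

```python
def deviation(scores, k):
    """Assumed stand-in: mean absolute gap between score k and all other scores."""
    others = [s for i, s in enumerate(scores) if i != k]
    if not others:
        return 0.0
    return sum(abs(s - scores[k]) for s in others) / len(others)


def select_description(scores, threshold, prefer_high=True):
    """Return the index of the selected description, or None if none stands out."""
    if not scores:
        return None
    pick = max if prefer_high else min
    # Candidate with the highest (claim 32) or lowest (claim 33) score
    k = pick(range(len(scores)), key=lambda i: scores[i])
    # Accept it only if it is sufficiently separated from the other candidates
    return k if deviation(scores, k) > threshold else None
```

Returning `None` when the deviation is too small lets a caller fall back to another strategy instead of accepting a match that barely differs from its competitors.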

34. The method as recited in claim 31, wherein the deviation is calculated using
a formula for $d_{recognition}$ built from the absolute score differences
$|S_i - S_k|$ summed over $i = 1, \ldots, k-1$ and over $i = k+1, \ldots, T$,
where $S_i$ is the recognition score for document i, k is the index of the
matched document, and T is the number of candidate documents.

35. The method as recited in claim 25, further comprising pruning for reducing
processing.

36. The method as recited in claim 25, further comprising retrieving portions of
the document.

37. The method as recited in claim 36, wherein the portion is retrieved using a
content identifier pre-associated with the portion.

38. The method as recited in claim 25, wherein the method is performed during
replay of a transaction pattern.

39. The method as recited in claim 25, wherein a hint is received, wherein the
hint indicates that one document description is more likely to match the document
than another document description.

40. The method as recited in claim 38, wherein the hint includes an order of
processing by which one document description is processed in respect to other
document descriptions.

41. The method as recited in claim 38, wherein the hint includes a hint threshold,
wherein the hint threshold is a value for determining when a document
description matches the document.

42. The method as recited in claim 38, wherein the hint includes an order of
processing by which one document description is processed in respect to other
document descriptions, and a hint threshold, wherein the hint threshold is a
value that tells the algorithm when the document is matched.
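
A hint carrying a processing order and a hint threshold, as in claims 39 through 42, could be used roughly as sketched below. The function `identify_with_hint`, its parameters, and the fallback of trying unhinted descriptions afterwards are assumptions made for this illustration.

```python
def identify_with_hint(document, descriptions, score_fn, order=None, hint_threshold=None):
    """Score descriptions in hinted order; stop early once the hint threshold is met."""
    # Hinted descriptions come first, in the order given by the hint.
    ids = [d for d in (order or []) if d in descriptions]
    # Descriptions not named in the hint are still tried afterwards.
    ids += [d for d in descriptions if d not in ids]
    best_id, best_score = None, float("-inf")
    for description_id in ids:
        score = score_fn(descriptions[description_id], document)
        if hint_threshold is not None and score >= hint_threshold:
            # The hint threshold says this score is good enough to call a match.
            return description_id, score
        if score > best_score:
            best_id, best_score = description_id, score
    return best_id, best_score
```

Here `descriptions` maps identifiers to descriptions and `score_fn(description, document)` is any scoring function; with a good hint, the loop returns after scoring a single description.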

43. A computer program product for identifying a document, comprising:
(a) computer code for receiving a document;
(b) computer code for receiving document descriptions of several documents;
(c) computer code for comparing the document descriptions with the document;
(d) computer code for calculating a document recognition score for each of the
document descriptions based on a likelihood that the document description
matches the document;
(e) computer code for selecting a document description based at least in part on
the document recognition scores; and
(f) computer code for identifying the document based on the selected document
description.

44. A method for identifying content, comprising:
(a) receiving several content elements;
(b) receiving a content description of a desired content element;
(c) comparing the content description with the received content elements;
(d) calculating a content recognition score for each of the content elements
based on a likelihood that the content description matches the content element;
and
(e) selecting a matching content based at least in part on the content
recognition scores.

45. A method for creating a description of a document of a remote network data
source for later identification of the document, comprising:
(a) receiving information from a user about a document on a remote network data
site, wherein the information received from the user includes at least one of:
an identification of content of interest in the document, guidelines for
recognizing a document, and guidelines for recognizing content elements of
interest;
(b) creating a document identifier based on the user-input information, wherein
the document identifier identifies the particular document;
(c) retrieving a markup language description defining properties of elements of
a document in a markup language;
(d) comparing the document to at least one other document utilizing the document
identifier and the markup language description;
(e) analyzing the content of the document utilizing the document identifier and
the markup language description for identifying elements of interest of the
content of the document;
(f) generating a description of the document based on the comparison and
analysis, wherein the document description contains a list of the elements of
interest and element properties for the elements of interest, wherein the
document description reflects at least one difference between the document and
the at least one other document; and
(g) storing the document description.



